ABSTRACT

We present the X-ray pipeline developed for the purpose of the cluster search in the
XMM-LSS survey. It is based on a two-stage procedure via a dedicated handling of
the Poisson nature of the signal: (1) source detection on multi-resolution wavelet
filtered images; (2) source analysis by means of a maximum likelihood fit to the photon
images. The source detection efficiency and characterisation are studied through
extensive Monte-Carlo simulations. This led us to define two samples of extended sources:
the C1 class that is uncontaminated, and the less restrictive C2 class that allows for
50% contamination. The resulting predicted selection function is presented and the
comparison to the current XMM-LSS confirmed cluster sample shows very good
agreement. We arrive at average predicted source densities of about 7 C1 and 12 C2 per
deg^2, which is higher than in any available wide field X-ray survey. We finally notice a
substantial deviation of the predicted redshift distribution for our samples from the
one obtained using the usual assumption of a flux limited sample.

Key words: surveys - X-ray: general - methods: data analysis - X-ray: galaxies: 
clusters - large-scale structure of Universe 



1 INTRODUCTION 

X-ray imaging is recognized to be one of the most sensi- 
tive and reliable methods to detect galaxy clusters. The 
main reason for this comes from the extended nature of 
the cluster emission, whose intensity is closely related to 
the depth of the associated potential well. Moreover, at 
high galactic latitude and medium-deep X-ray sensitivity 
(10^-14 - 10^-15 erg s^-1 cm^-2 in the [0.5-2] keV band), the
mean source density (footnote 1) is much lower than at optical or
NIR wavelengths. Both aspects combine to significantly reduce the
projection effects that become critical in the optical bands
at z > 0.5. The task of discovering and characterizing

* E-mail: pacaud@discovery.saclay.cca.fr 
† Present address: ESA, Villafranca del Castillo, Spain
1 100 to 800 sources per deg^2, of which about one tenth are extended



X-ray clusters is, however, complicated by the Poisson na- 
ture of X-ray data combined with several instrumental ef- 
fects (PSF, vignetting, CCD patterns, X-ray and particle 
background) that have to be disentangled from the intrinsic 
emission profile of the sources. 

With its mosaic of overlapping 10^4 s XMM pointings,
the XMM-LSS survey has been designed to detect a significant
fraction of the cluster population out to z=1, over an
area of several tens of deg^2, so as to constitute a sample
suitable for cosmological studies (Pierre et al. 2004). The
trade-off in the survey design was depth versus coverage, keeping
the total observing time within reasonable limits. The
two major requirements of the X-ray processing were thus 
to reach the sensitivity limit of the data in a statistically 
tractable manner in terms of cluster detection efficiency, 
and to subsequently provide the selection function of the 
detected objects. 



2 F. Pacaud et al. 



To achieve these goals, it was necessary to design a new 
two-step X-ray pipeline, combining wavelet multi-resolution 
analysis and maximum likelihood fits, both using Poisson 
statistics. This substantial development was required, as our
controlled tests on simulated cluster fields revealed unsatisfactory
performance for extended sources using the early
versions of the official detection software provided by the
XMM-SAS (footnote 2; see Valtchanov, Pierre & Gastaud 2001). Our
approach follows the principles pioneered by Vikhlinin et al.
(1998), which were originally established for the ROSAT
PSPC data, and that we have thoroughly revised to optimally
handle the complex XMM instrumental characteristics.

The present paper provides a detailed description of
our pipeline (a two-year effort), of its performance, and
of the resulting computation of the selection function both
for point-like and extended sources. Section 2 describes the
various steps and parameters of the procedure. Section 3
presents a global evaluation of the pipeline using Monte-Carlo
image simulations. These are in turn used to define a
system of classes for cluster candidate sources, allowing for
various degrees of completeness or contamination. Finally,
in Section 4 we present a case study for the computation of
the survey selection function, relying on the pipeline source
classification, in a standard cosmological context.



2 PIPELINE DESCRIPTION 

The pipeline proceeds in three steps: 

(i) Starting from raw observation data files (ODFs), cal- 
ibrated event lists are created using the XMM-SAS tasks 
emchain and epchain. These are then filtered for solar soft 
proton flares and used to produce images. 

(ii) The images are filtered in wavelet space, then scanned 
by a source detection algorithm set to a very low threshold 
to obtain a primary source list. 

(iii) Detailed properties of each detected source are assessed
from the photon images using Xamin, a maximum
likelihood profile fitting procedure. This package was designed
for the purpose of the XMM-LSS survey, with the
specific goal of monitoring in a clean and systematic way
the characterization of extended X-ray sources and associated
selection effects.



2.2 Source detection 

In order to maximize detection rates, and provide good input
to the maximum likelihood fit for both point-like and
extended sources within an acceptable computation time,
we follow the prescription of Valtchanov et al. (2001),
extensively tested on numerical simulations, to use a mixed
approach combining wavelet filtering of the images and detection
with a procedure initially developed for optical images
(SExtractor).

In each band, the 3 EPIC detector images are co-added
and the resulting image is filtered using the wavelet
task mr_filter from the multi-resolution package MR/1
(Starck, Murtagh & Bijaoui 1998). This task incorporates a
statistically rigorous treatment of the Poisson noise which
enables the removal of insignificant signal directly in
wavelet space using a thresholding algorithm. A subsequent
iterative image reconstruction process accurately recovers
the flux and shape of the relevant structures contained in
the data. The details of the procedure and an evaluation
of its ability to properly reconstruct faint sources in the
Poisson regime are given in Starck & Pierre (1998) and
Valtchanov et al. (2001).

The primary source catalogues are then derived by running
SExtractor (Bertin & Arnouts 1996) on the filtered image.
The use of this software is now possible because the multiresolution
filtering has removed most of the noise and produced
a smoothed background. To avoid border effects we restrict
our analysis to the inner 13' of the field (footnote 3). With our current
settings, the software essentially proceeds in four steps. First
the background level is iteratively estimated in image cells
by 3σ clipping and a full-resolution background map is constructed
by bicubic-spline interpolation. Sources are then
identified as groups of adjacent pixels exceeding an intensity
threshold. The software subsequently tries to split blended
sources by re-thresholding at some sublevels between the
original threshold and the peak value of each source and
looking for features containing a significant amount of the
flux in the emission profile. Finally a detailed analysis of the
source is performed: isophotal analysis to determine source
position and shape, and photometry in a flexible elliptical
aperture as defined in Kron (1980) and Infante (1987).
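
The first of these four steps can be illustrated with a minimal sketch of iterative 3σ clipping in a single background cell. This is a simplified illustration written for this text, not SExtractor's actual implementation (which also applies a mode estimator and median filtering of the cells); the function name and the sample pixel values are hypothetical.

```python
import statistics

def sigma_clipped_background(values, n_sigma=3.0, max_iter=20):
    """Iteratively discard pixels further than n_sigma standard deviations
    from the median, then return (mean, standard deviation) of the rest."""
    sample = list(values)
    for _ in range(max_iter):
        if len(sample) < 2:
            break
        centre = statistics.median(sample)
        spread = statistics.pstdev(sample)
        if spread == 0:
            break
        kept = [v for v in sample if abs(v - centre) <= n_sigma * spread]
        if len(kept) == len(sample):
            break  # nothing was clipped: converged
        sample = kept
    return statistics.mean(sample), statistics.pstdev(sample)

# Hypothetical cell: a flat background plus one bright source pixel.
cell = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.05, 0.95, 1.1, 0.9, 1.0, 1.15, 25.0]
level, rms = sigma_clipped_background(cell)
```

In this toy cell the bright pixel is rejected in the first pass, so the returned level reflects only the flat component.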

Parameters of the source detection steps are summarized
in Table 1.



2.1 Image extraction 

Once event lists have been created, proton flare periods are
filtered following the method proposed by Pratt & Arnaud
(2002), i.e. using the light curves of high-energy events (10-12 keV
for MOS, 12-14 keV for PN). Histograms of each light
curve, binned by 104 seconds, are produced and fitted by a
Poisson law to determine the mean of the distribution, λ. We
then apply a 3σ threshold, so that time intervals where the
emission exceeds λ + 3√λ are thrown out as contaminated.
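
A minimal sketch of this flare cut is given below. It assumes that the Poisson mean λ can be approximated by the sample mean of the binned counts, and it repeats the cut until the set of accepted bins is stable so that the flares themselves do not inflate λ; this iteration is an assumption of the sketch, whereas the pipeline fits a Poisson law to the light-curve histogram. The function name and the example light curve are hypothetical.

```python
import math

def good_time_bins(counts, n_sigma=3.0, max_iter=10):
    """Flag light-curve bins whose counts stay below lambda + n_sigma*sqrt(lambda).

    Returns (keep_flags, lam, limit); lam is re-estimated from accepted bins.
    """
    keep = [True] * len(counts)
    lam = limit = float("nan")
    for _ in range(max_iter):
        quiet = [c for c, k in zip(counts, keep) if k]
        lam = sum(quiet) / len(quiet)            # sample mean as Poisson mean
        limit = lam + n_sigma * math.sqrt(lam)   # 3-sigma Poisson threshold
        new_keep = [c <= limit for c in counts]
        if new_keep == keep:
            break
        keep = new_keep
    return keep, lam, limit

# Hypothetical high-energy light curve (counts per time bin) with a flare.
light_curve = [20, 22, 19, 21, 18, 23, 20, 150, 160, 21]
keep, lam, limit = good_time_bins(light_curve)
```

Bins flagged False would be removed from the good time intervals before image extraction.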

Images of 2.5"/pix containing single and double events
are then produced using evselect in each of the 5 energy
bands: [0.3-0.5], [0.5-2], [2-4.5], [4.5-10] and [2-10] keV. 



2.3 Source validation and characterization: Xamin 

At the end of the pipeline processing, all the sources detected 
by SExtractor are analyzed by Xamin using the binned pho- 
ton images. 

For each source, Xamin determines a model that maxi- 
mizes the probability of generating the observed spatial pho- 
ton distribution. First, a point source model is tested, then 
an extended source profile parametrized by a spherically
symmetric β-model (Cavaliere & Fusco-Femiano 1976):



$$ S_X(r) \propto \left[ 1 + (r/r_c)^2 \right]^{-3\beta + 1/2}, \qquad (1) $$



2 XMM Science Analysis System, http://xmm.vilspa.esa.es/sas/;
for subsequent data analysis we used v6.1 of this package



3 The centre of the pointing is computed as a sensitivity-weighted
average of the optical axis positions of the three telescopes, taken
from the exposure map header keywords XCEN and YCEN

Table 1. Relevant parameters of the XMM-LSS pipeline detection
stage. Note that the high SExtractor detection threshold
does not imply that we are being restrictive, but rather reflects
the fact that the software is run on already adaptively smoothed
images.

Parameter                      Value
------------------------------------------------------------
Event selection:
  MOS event flag selection     #XMMEA_EM
  PN event flag selection      (FLAG & 0x2fb002c)==0
  MOS patterns                 [0:12]
  PN patterns                  [0:4]
Image:
  Type                         sky
  Configuration                co-addition of EPIC detectors
  Pixel size                   2.5"
MR/1:
  Wavelet type                 B-spline
  Transform algorithm          "à trous"
  Poisson noise threshold      10^-3 (~ Gaussian 3.1σ)
  Lowest significant scale     2 pix.
  Highest significant scale    256 pix.
SExtractor:
  Background cell side         64 pix.
  Background median filtering  4 cells
  Detection threshold          6σ
  Detection minimum area       12 pix.
  Deblending sub-thresholds    64
  Deblend min. contrast        0.003


convolved with the XMM point spread function. As we generally
do not have enough S/N with our data to estimate
simultaneously both r_c and β (especially with a 2D fit), we
decided to fix β to the canonical value of 2/3 that is widely
used to model the X-ray emission profile of massive galaxy
clusters. Similarly, fitting more sophisticated models (e.g.
elliptical) is not justified. The best fit parameters for both
models are listed in the output along with relevant parameters
characterizing the source (see the list in Table 2).
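
As an illustration of the profile with β fixed to 2/3, the sketch below evaluates the unconvolved β-model and the fraction of its total flux enclosed within a given radius. For β = 2/3 the surface brightness falls as [1 + (r/r_c)^2]^(-3/2), whose integral over the plane has the closed form used here. The PSF convolution is deliberately left out, and the function names are hypothetical.

```python
import math

BETA = 2.0 / 3.0  # canonical value adopted in the text

def beta_profile(r, r_c, beta=BETA):
    """Unnormalized beta-model surface brightness at projected radius r."""
    return (1.0 + (r / r_c) ** 2) ** (-3.0 * beta + 0.5)

def enclosed_fraction(radius, r_c):
    """Fraction of the total flux inside `radius`, valid for beta = 2/3 only.

    Integrating 2*pi*r*[1 + (r/r_c)^2]^(-3/2) from 0 to R gives
    2*pi*r_c^2 * (1 - 1/sqrt(1 + (R/r_c)^2)); the total is 2*pi*r_c^2.
    """
    return 1.0 - 1.0 / math.sqrt(1.0 + (radius / r_c) ** 2)

# Half of the flux of a 20" core-radius source lies within sqrt(3)*20".
half_light = enclosed_fraction(math.sqrt(3.0) * 20.0, 20.0)
```

The slow convergence of the enclosed fraction shows why a sizeable part of the flux of an extended source can fall outside a finite fitting box.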

2.3.1 Likelihood model 

The statistic used to assess the reliability of a given model
is a simplified version of the C-statistic (Cash 1979):

$$ C = 2 \sum_{i=1}^{N_{pix}} \left( m_i - y_i \ln m_i \right), \qquad (2) $$

where $y_i$ is the number of observed photons in pixel i, and
$m_i$ is the model value in that same pixel. In our specific
case, the emission profile of a source is the product of its
normalization $N_{mod} = \sum_{i=1}^{N_{pix}} m_i$ and its spatial distribution
$d_i$, which are independent: $m_i = N_{mod} \times d_i$. The C-statistic
thus reads:

$$ C = 2 \left( N_{mod} - N_{data} \ln N_{mod} \right) - 2 \sum_{i=1}^{N_{pix}} \left( y_i \ln d_i \right), \qquad (3) $$

where $N_{data} = \sum_{i=1}^{N_{pix}} y_i$. Minimization of the C-statistic
with respect to $N_{mod}$ directly yields $N_{mod} = N_{data}$ and we
consequently decided to fix $N_{mod}$ and use the statistic:

$$ E = -2 \sum_{i=1}^{N_{pix}} \left( y_i \ln d_i \right), \qquad (4) $$

which is equivalent to the C-statistic as far as parameter
estimation is concerned. This formalism has the advantage
of reducing the parameter space of the fit by one dimension
(the overall normalization). However, it should be noted
that the normalization term $2 (N_{mod} - N_{data} \ln N_{mod})$ that
we have cancelled for the fit still contributes to the error budget,
and has to be reintroduced while computing confidence
ranges.
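
The decomposition of the C-statistic into a normalization term and the E statistic can be checked numerically. The sketch below is an illustration written for this text, with hypothetical function names and toy pixel values; pixels are passed as flat lists and the spatial distribution d is assumed to sum to one.

```python
import math

def c_statistic(y, m):
    """Simplified Cash statistic: C = 2 * sum(m_i - y_i * ln m_i)."""
    return 2.0 * sum(mi - yi * math.log(mi) for yi, mi in zip(y, m))

def e_statistic(y, d):
    """E = -2 * sum(y_i * ln d_i); empty pixels contribute nothing."""
    return -2.0 * sum(yi * math.log(di) for yi, di in zip(y, d) if yi > 0)

# Toy example: observed counts and a normalized spatial distribution.
y = [3, 0, 5, 2]
d = [0.3, 0.1, 0.4, 0.2]
n_data = sum(y)
n_mod = float(n_data)                  # best-fit normalization
m = [n_mod * di for di in d]

norm_term = 2.0 * (n_mod - n_data * math.log(n_mod))
residual = c_statistic(y, m) - (norm_term + e_statistic(y, d))
```

The residual vanishes to rounding precision, which is the statement that C and E differ only by the normalization term.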

Here, we stress that, despite the common terminology, 
the C (and E) statistics are not likelihood functions (which 
have the dimension of a probability or probability density), 
but are actually related to the true likelihood $\mathcal{L}$ by:

$$ C = -2 \log \mathcal{L} + B, \qquad (5) $$

where B is a constant. 

As for the C-statistic, the increase of E between its best
fit value ($E_{B.F.}$) and a model containing only background
(i.e. uniform distribution of the photons), which is often improperly
referred to as 'detection likelihood', quantifies the
significance of a detection and is $\chi^2$ distributed in the limit
of large number of counts (see Cash 1979). From now on, we
refer to this parameter as the detection statistic:

$$ DET\_STAT = 2 N_{data} \ln(N_{pix}) - E_{B.F.}. \qquad (6) $$

Similarly, the significance of the estimated extension can
be assessed using an extension statistic (improperly referred
to as 'extension likelihood') which compares the value of E
for the best fit point-like and extended source models (once
again $\chi^2$ distributed in the limit of large number of counts):

$$ EXT\_STAT = (E_{B.F.})_{point} - (E_{B.F.})_{extended}. \qquad (7) $$

The interpretation of these statistics in terms of a detection/extension
probability using the $\chi^2$ limit depends on
the number of fitted parameters. All the statistics are thus
ultimately converted into equivalent values that would correspond
to a fit with two free parameters yielding the same
probability. This provides a unique and well-defined link
between our statistics and probability: for any statistic S,
$P = \exp(-S/2)$.
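
One way to carry out such a conversion is sketched below: the raw statistic is turned into a probability with the χ² survival function for the actual number of free parameters, and that probability is mapped back through P = exp(-S/2), which is the χ² survival function for two degrees of freedom. The number of degrees of freedom to use for each statistic is not specified here and is left as an input; the function names are hypothetical.

```python
import math

def chi2_sf(x, dof):
    """Survival function P(chi2 > x) for an integer number of dof >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, dof // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + exp(-x/2) * sum_j (x/2)^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (j + 0.5)
    return total

def equivalent_two_param_stat(stat, n_free):
    """Value S such that exp(-S/2) equals the chi2 probability of `stat`."""
    return -2.0 * math.log(chi2_sf(stat, n_free))
```

With two free parameters the conversion is the identity, and with more free parameters the same raw statistic maps to a smaller equivalent value.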

2.3.2 Source processing 

For each source, a fitting box is extracted, the size of which
depends on the SExtractor inputs (starting from 3 times the
estimated FWHM, with the added requirement that it always be
at least 35"). The SExtractor pixel segmentation mask is
used to flag out pixels belonging to neighbouring sources
included in the box. This method works well both for source
characterisation and classification in our shallow exposures (footnote 4),
but one would ideally have to implement a simultaneous fit
of blended sources in very crowded fields (in development).

The source models take into account all significant
XMM instrumental effects: an image of the source emission
profile is constructed (footnote 5) and normalized to the tested count

4 we detect some 0.1 source per arcmin^2 for a PSF FWHM of 6"

5 For both point-source and extended source profiles, we use the
MEDIUM PSF model from the XMM calibration data which is

Table 2. Xamin output parameters. Notes: (a) listed in the catalogues
for both point-like and extended profile fits, (b) issued for
each of the three EPIC detectors. Free parameters of the fitting
process are written in bold font.

Parameter           Description
--------------------------------------------------------------------------
CUTRAD              Size of the fitting box
EXP (b)             Mean exposure time in the box
GAPFLAG (b)         Distance to nearest CCD gap
GAP_NEIGHBOUR       Distance to nearest detected neighbour in the fitting box
EXT                 Best fit core radius
EXT_STAT            Extension statistic
DET_STAT (a)        Detection statistic
X_IMA, Y_IMA (a)    Best fit position in pixel
RA, DEC (a)         Best fit sky coordinates
RATE_MOS (a)        EPIC-MOS count rate
RATE_PN (a)         EPIC-PN count rate
SCTS_MOS (a)        Estimated source counts in MOS1+2
SCTS_PN (a)         Estimated source counts in PN
BG_MAP_MOS (a)      Background level in MOS1+2
BG_MAP_PN (a)       Background level in PN
PIX_DEV (a)         Distance between input/output position
N_ITER (a)          Number of AMOEBA iterations

Table 3. List of cluster simulations. For each cluster core radius,
the number of simulated pointings performed for each count rate
(N_point) is given, as well as the number of simulated sources per
pointing (N_src) in the central 10'.

Radius (")    Count rate (cts s^-1)    N_point    N_src
-------------------------------------------------------
10            0.005                    10         8
              0.01                     10         8
              0.02                     10         8
              0.05                     10         8
              0.1                      10         8
20            0.005                    10         8
              0.01                     10         8
              0.02                     10         8
              0.05                     10         8
              0.1                      10         8
50            0.01                     15         6
              0.02                     15         6
              0.05                     15         6
              0.1                      15         6
100           0.02                     30         4
              0.05                     30         4
              0.1                      30         4


rate (footnote 6); this image is then multiplied by the exposure maps
(taking into account vignetting, detection mask, quantum
efficiency and the azimuthal sensitivity variations due to
the anisotropic transmission from the Reflection Grating Arrays)
and a uniform background is added, whose level is set so as
to match our normalization requirement (N_mod = N_data).
Given the faint sources that we are analyzing, this very simple
background model is justified in the absence of small scale
variations of the XMM background in the soft bands. While
the EPIC-PN detector is considered as an independent instrument,
both the EPIC-MOS detectors are assumed to
provide the same count rate for the source and are thus
modelled as a single detector using the summed photon image
and exposure map.
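
The construction of the model image for one detector can be sketched as follows. This is a simplified illustration under stated assumptions: pixels are stored as flat lists, the input profile is already PSF-convolved and normalized so that it sums to one over the whole sky, and the background is taken to be flat in counts per pixel (whether it should instead be flat in rate and scaled by the exposure is not specified here). All names and numbers are hypothetical.

```python
def model_image(profile, exposure, rate, n_data):
    """Build model counts per pixel and the normalized distribution d_i.

    profile  : normalized, PSF-convolved source profile per pixel
    exposure : effective exposure time per pixel (s)
    rate     : tested source count rate (cts/s)
    n_data   : observed number of photons in the fitting box
    """
    source = [rate * p * e for p, e in zip(profile, exposure)]
    # Uniform background level chosen so that the model total equals n_data.
    background = (n_data - sum(source)) / len(source)
    if background < 0:
        raise ValueError("tested count rate exceeds the observed counts")
    model = [s + background for s in source]
    total = sum(model)
    d = [m / total for m in model]
    return model, d, background

# Toy 4-pixel box: part of the profile falls outside the box.
profile = [0.1, 0.5, 0.1, 0.05]
exposure = [10000.0, 10000.0, 9000.0, 8000.0]
model, d, background = model_image(profile, exposure, rate=0.01, n_data=93)
```

The normalized distribution d is what enters the E statistic, while the model total equals the observed counts by construction.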

Starting from the SExtractor outputs as a first guess,
the statistic E is minimized using the simplex method
AMOEBA (Press et al. 1992), for both the point source
and extended emission models. It takes some 10 minutes
for Xamin to process the average 120 detections per pointing
found by SExtractor. The procedure output catalogue
comprises 29 derived parameters in addition to the 9 free
parameters of the fits (4 for the point source model and 5
for the extended profile). These are listed in Table 2.



3 PERFORMANCE EVALUATION THROUGH 
MONTE-CARLO SIMULATIONS 

3.1 Description of the simulations 

To assess the quality of our data analysis, we performed
extensive Monte-Carlo simulations of 10^4 s XMM pointings
with the software InstSimulation (Valtchanov et al. 2001).

This procedure creates images from a source list taking 
into account the main instrumental characteristics (PSF, 
vignetting, detector masks, background, Poisson noise). In 
the following, cluster searches (in simulations as well as real 
pointings) are performed in the [0.5-2] keV band, and stated 
count rates or fluxes always refer to this band. Galaxy clus- 
ter emission is indeed barely detectable at higher energies 
in our low exposure pointings, because of the combined ef- 
fect of the redshifted bremsstrahlung exponential cut-off, the 
XMM drop in sensitivity and strong particle background 
above 2 keV. 

The PSF of the simulations is obtained from the XMM
calibration files MEDIUM model, while the azimuthally-averaged
off-axis dependency at 1 keV is used to model the
vignetting. When simulated, the particle and photon background
levels were taken from Read & Ponman (2003). In
order to convert source fluxes to count rates, we assumed a
constant EPIC-PN to EPIC-MOS count rate ratio regardless
of the source spectrum. Note that in the following we
will always refer to count rates as the sum of MOS1, MOS2
and PN rates after vignetting correction (footnote 7). This means that
for the same count rate, a source is more easily detected
near the centre than on the border of the FOV.

Four kinds of simulations were performed: 

• 30 pointings of 10^4 s containing only point sources.
The flux distribution and source density are computed using
the log(N)-log(S) from Moretti et al. (2003) down
to 5 x 10^-16 erg s^-1 cm^-2. The background values
from Read & Ponman (2003) were accordingly corrected
for the contribution of point sources fainter than 4 x
10^-15 erg s^-1 cm^-2 (approximate flux limit of their analysis).
We assumed a random spatial distribution of the
sources (therefore neglecting the known angular correlation
among AGNs).

the only one that reproduces the strong distortions of the PSF at
large off-axis angles

6 For extended sources, this count rate is actually required to
match the integral of the profile to infinity, as a significant amount
of the source flux can fall outside the fitting box

7 in our 10^4 s pointings, 10^-2 cts s^-1 roughly corresponds on the
optical axis to 100 cts spread over the three EPIC detectors and a
flux of about 9 x 10^-15 erg s^-1 cm^-2 for both an AGN spectrum
(a power law SED with spectral index Γ = 1.7) and a local 2 keV
cluster (thermal bremsstrahlung)

• 250 pointings of 10^4 s containing extended sources only
(β-model with fixed β=2/3) with simulated background. We
simulated core radii of [10, 20, 50, 100] arcsec with count
rates in the range [0.005, 0.01, 0.02, 0.05, 0.1] cts s^-1 (see
Table 3 for the exact list). The spatial distribution of the sources
was set at random so as to cover most of the area within 10',
with the extra requirement that sources do not overlap.

• 250 simulations with the same extended sources as
previously, but injected into a real XMM-NEWTON
10^4 s pointing pertaining to the XMM-LSS (XMM Id:
0037980501), in order to estimate how extended source characterization
is affected by the point source population.

• 18 simulations containing close pairs of point-like
sources (separated by 20") injected into a typical real XMM-NEWTON
pointing to test the deblending capabilities of the
pipeline. In the first 9 simulations, 10 pairs of 3 x 10^-3 cts s^-1
were added in the pointing, while in the remaining ones, 5
pairs of 5 x 10^-2 cts s^-1 were simulated. These simulations
are also relevant for the cluster false detection rate, as blended
point sources may be characterized as 'extended' sources.

Examples of simulated images are given in the accompanying figures.

All simulated images were analysed through steps (ii)
and (iii) of our pipeline (see Section 2). Detected sources were then
cross-identified with the simulation inputs using a correlation
radius of 5 pixels for point sources and 15 pixels for
extended ones. In the following subsections, we will refer
to spurious detections as those that could not be cross-identified
with any input source.
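
The cross-identification step can be sketched as a nearest-neighbour match within the correlation radius. Choosing the nearest input when several fall inside the radius is an assumption of this sketch, as are the function name and the toy coordinates; positions are given in image pixels.

```python
import math

def cross_identify(detections, inputs, radius_pix):
    """Match each detection to the nearest input source within radius_pix.

    Returns (matches, spurious): a dict detection index -> input index,
    and the list of detection indices with no counterpart.
    """
    matches, spurious = {}, []
    for i, (dx, dy) in enumerate(detections):
        best, best_dist = None, radius_pix
        for j, (sx, sy) in enumerate(inputs):
            dist = math.hypot(dx - sx, dy - sy)
            if dist <= best_dist:
                best, best_dist = j, dist
        if best is None:
            spurious.append(i)
        else:
            matches[i] = best
    return matches, spurious

# Toy example with the 5-pixel radius used for point sources.
detected = [(100.0, 100.0), (203.0, 198.0), (400.0, 400.0)]
simulated = [(101.0, 102.0), (200.0, 200.0)]
matches, spurious = cross_identify(detected, simulated, radius_pix=5.0)
```

For extended sources the same function would be called with a radius of 15 pixels.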

3.2 Parameter estimation accuracy 

3.2.1 Extended sources 

Our simulations demonstrate that the mean photometry of
extended sources with Xamin is satisfactory in pointings
both with and without point sources: it is unbiased for bright
sources with a mean dispersion of about 20% (see fig. 2),
while the unavoidable Eddington bias and increased scatter
appear for fainter ones. Count rates seem somewhat overestimated
only for very faint or very extended sources. The
scatter increases slowly as count rates decrease but also with
increasing radius.

Even when the clusters are injected into real pointings,
the performance remains correct up to 50" core radius.
Knowing that, for a physical core radius of 180 h_70^-1 kpc, the
apparent core radii span the range 55" - 22" for 0.2 < z < 1
in ΛCDM, the goals of the pipeline are fully met. For very
faint sources (5 x 10^-3 cts s^-1), the rates are somewhat underestimated,
which can be explained by the fact that only
the central brightest part of the sources clearly emerges from
the background fluctuations. Above 50" core radii, the rates
are somewhat overestimated. In addition, we notice, surprisingly,
a weak increase of the detection efficiency of these
sources when adding AGNs. The simplest interpretation is
that part of the very extended sources found their emission
contaminated by faint AGNs (that fall below our detection
or deblending capacity), and thus tend to pass more easily
the detection criteria of the pipeline, but with erroneous
photometry and core radius.
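
The quoted range of apparent core radii can be reproduced with a short numerical sketch. The cosmological parameters used here (flat ΛCDM with H0 = 70 km/s/Mpc and Ω_m = 0.3) and the reading of the physical core radius as 180 kpc for that value of H0 are assumptions of this sketch; the function names are hypothetical.

```python
import math

C_KM_S = 299792.458                      # speed of light in km/s
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def angular_diameter_distance_mpc(z, h0=70.0, omega_m=0.3, steps=1000):
    """Angular diameter distance in Mpc for a flat LambdaCDM universe."""
    omega_l = 1.0 - omega_m

    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)

    # Composite Simpson rule for the comoving distance (steps must be even).
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * inv_e(k * h)
    comoving = (C_KM_S / h0) * total * h / 3.0
    return comoving / (1.0 + z)

def apparent_core_radius_arcsec(z, r_c_kpc=180.0):
    """Angular size in arcsec of a physical core radius given in kpc."""
    d_a_kpc = angular_diameter_distance_mpc(z) * 1000.0
    return r_c_kpc / d_a_kpc * ARCSEC_PER_RAD

theta_low_z = apparent_core_radius_arcsec(0.2)
theta_high_z = apparent_core_radius_arcsec(1.0)
```

Under these assumptions the two values come out close to the 55" and 22" bounds of the range quoted in the text.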

A second point to note, comparing the left and right
panels of figure 2, is that the photometry seems tightly correlated
with extension measure accuracy. A poor modelling
of the source emission profile logically yields incorrect count 
rate estimates, particularly for very extended sources where 
the flux is extrapolated far outside of the fitting box. An 
inaccurate estimate of the source extensions is thus proba- 
bly the reason for both the low count-rate and the high core 
radius photometry bias identified above. 



3.2.2 Point sources 

As shown in the corresponding figure, the point source photometric dispersion
is basically comparable to the spread due to Poisson
noise down to 20 cts. At the faint tail of the distribution, a 
strong Eddington bias appears. 

Another issue regards point source confusion. We used
our set of close pair simulations to test the deblending efficiency.
The results are quite satisfactory: all 5 x 10^-2 cts s^-1
pairs are deblended, and more than 65% of the 3 x
10^-3 cts s^-1 pairs are as well. This success rate cannot easily
be reached by first step detection procedures based on
sliding cells having a minimum size of 10". This point is not
only important for point source statistics, but also for cluster
detection, in order not to consider blended point sources
as a single extended one.



3.3 Source classification using the simulations 

Source selection and estimation of the selection function in 
surveys is always a complicated task and results from the 
necessary trade-off between sample completeness and con- 
tamination. For this purpose, we explored the Xamin output 
parameter space by means of our simulations in order to set 
well controlled extended/point-like source selection criteria, 
and to estimate contamination by spurious or misclassified 
sources. 



3.3.1 Point sources 

As AGNs represent more than 90% of the extragalactic X-ray
sources at our sensitivity, we restrict ourselves to the
estimation of the spurious detection rate based on our point
source simulations. As can be seen in the corresponding figure, a simple
threshold of 15 in the detection statistic gives the best
balance between contamination and completeness: at this
threshold, some 40 to 50 real point-like sources are detected
in each pointing within 10' of the FOV, for only 0.5 spurious
ones.

The resulting detection efficiency as a function of count
rate is shown in the associated figure. The point-source flux limit (90%
completeness) is about 4 x 10^-15 erg s^-1 cm^-2 in [0.5-2] keV,
but more than 50% of the sources are detected down to
~ 2.5 x 10^-15 erg s^-1 cm^-2.

Table 4. Source selection criteria with the XMM-LSS pipeline

Classification      Criteria
-----------------------------------------------
Class 1 extended    Detection statistic > 32,
                    Extension statistic > 33,
                    Extension > 5"
Class 2 extended    Extension statistic > 15,
                    Extension > 5"
Point source        Neither C1 nor C2,
                    Detection statistic > 15




3.3.2 Extended sources 

Source selection is complicated for extended sources because
these objects are generally of lower surface brightness (see
e.g. the simulated images), and one not only has to deal with
spurious detections, but also with contaminating misclassified
point sources. This task requires special care, keeping in
mind the cosmological applications of the survey. An
accompanying figure shows the fraction of extended sources that are
detected by SExtractor in the primary catalogue as a function
of flux and core radius. Our purpose is then to find a location
in the Xamin output parameter space where the majority of
these sources are recovered while keeping the contamination
rate to a manageable level.

As a first step, we scanned the detection/extension
statistic-extension space for the largest uncontaminated extended
source sample. This is obtained for EXT > 5",
EXT_STAT > 33, and extended fit DET_STAT > 32 simultaneously
(footnote 8; see Table 2 for the definition of these parameters).
From now on, we will refer to this sample as class 1
(C1) extended sources. A figure illustrates the main C1 selection
process in the extension - extension statistic plane.

Due to our non-contamination requirement, the C1
sample naturally excludes a number of extended sources
(generally very low surface brightness or more compact
sources). A less conservative sample (required by the XMM-LSS
cluster search in order to detect as many valid sources as
possible) can be obtained by relaxing the previous criteria to
EXT > 5", EXT_STAT > 15 and no DET_STAT constraint
(see the associated figure). From the number of detections matching these
criteria in our point source simulations, we can estimate that
this class 2 sample (C2) contains less than one spurious detection
or misclassified point source every three pointings.
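
The selection criteria translate into a simple decision rule, sketched below. The thresholds are those stated in the text and in Table 4; the function name, argument names and returned labels are hypothetical, and the two detection statistics refer to the extended and point source fits respectively.

```python
def classify_source(ext_arcsec, ext_stat, det_stat_extended, det_stat_point):
    """Return 'C1', 'C2', 'point' or None according to the selection criteria."""
    extended = ext_arcsec > 5.0
    if extended and ext_stat > 33.0 and det_stat_extended > 32.0:
        return "C1"       # uncontaminated extended sample
    if extended and ext_stat > 15.0:
        return "C2"       # less restrictive extended sample
    if det_stat_point > 15.0:
        return "point"    # neither C1 nor C2, but significantly detected
    return None           # below all thresholds

# Toy examples of the three classes and of a rejected detection.
examples = [
    classify_source(20.0, 50.0, 80.0, 40.0),
    classify_source(12.0, 20.0, 25.0, 18.0),
    classify_source(1.0, 0.5, 30.0, 35.0),
    classify_source(2.0, 3.0, 8.0, 9.0),
]
```

Testing the C1 criteria first ensures that the C2 label is only given to sources that fail the stricter cut.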

The mean detection probabilities of extended sources
within 10' of the FOV are presented for both C1 and C2
samples in figure 8 as a function of count rate and apparent
core radius. As expected, this probability is higher within
the C2 sample for low surface brightness and faint compact
sources.

Note that detection efficiency is not a simple function
of source flux as is sometimes assumed in X-ray cluster surveys
(see e.g. Rosati et al. 1998), but it varies significantly

8 Note that from the definition of EXT_STAT (see section 2.3.1
and Table 2), it is very unlikely for DET_STAT to be lower than
32 if EXT_STAT is greater than 33, except in the few rare cases
where the point source fit crashed



Table 5. Contamination statistics predicted from the simulations
for each XMM-LSS pipeline source sample

Real source type    Classification    N_src/pointing
----------------------------------------------------
Point-like          class 1           0.0
Point-like          class 2           0.17
Spurious            class 1           0.0
Spurious            class 2           0.10
Spurious            point-like        0.53



when considering different source sizes, and this should be
modelled to interpret correctly the results of the XMM-LSS.
This impact of source extent on our detection capacity is illustrated
by figure 10, where the detection probability as a
function of luminosity and redshift is shown for the C2 sample,
assuming a canonical core radius of 180 h_70^-1 kpc. At high
redshift, where the angular distance is almost constant, our
selection process closely resembles a flux limit, while the sensitivity
drops at lower redshift. In this model, we find that
roughly 90% of the sources down to 3 keV are detected in C2
at z=0.5. This number falls to 50% at a redshift of 0.9-1. A
cluster similar to Coma (~8 keV) would always be detected
at least as C2 up to a redshift of 1, and have more than 75%
probability of being detected at z=2.



3.4 Validation on real data 

To further validate our selection criteria, we processed all
available XMM-LSS pointings and compared the pipeline
output with our simulation results. Our X-ray data currently
consist of 51 XMM-NEWTON pointings; 19 of them
(G pointings) were obtained from guaranteed-time observations
as part of a joint Liège/Milan/Saclay program (XMDS,
Chiappetti et al. 2005) and have 2 x 10^4 s exposure; the remaining
32 are 10^4 s long and were obtained with guest-observer
time. Among these, three pointings (one G and two
B) are unusable due to very high background levels (probable
solar flare contamination).

3.4.1 Point sources

In our 30 10^4 s exposure pointings, we obtain on average
45.8 sources per pointing with DET_STAT > 15 for the
point source fit. As a comparison, taking into account the
detection probabilities derived from the simulations and integrating over the
log(N) - log(S) of Moretti et al. (2003) between 5 x 10^-16
and 1 x 10^-11 erg s^-1 cm^-2 yields on average 47.4 sources per
pointing. Though the matching is already satisfactory, the
remaining difference mostly reflects the lack of very bright
sources in the XMM-LSS area (probably due to cosmic variance)
identified by Gandhi et al. (2006).

We additionally cross-identified our sources with those 
of the XMDS/VVDS 4σ catalogue (Chiappetti et al. 2005)
which results from an alternate analysis of the G fields (that 
mainly uses the standard XMM-SAS procedures and is thus 
suitable only for point sources). We found a very good agree- 
ment with both detected sources and their characteristics. 
Xamin count rates are always within the error bars of XMM- 
SAS emldetect measurements for sources that do not fall 
on CCD gaps. Moreover, our detection statistic values are
tightly correlated to their detection probability estimates. 

3.4.2 Extended sources

Until now, the XMM-LSS optical spectroscopy follow-up
program (see Pierre et al. 2004) has enabled us to confirm about
60 cluster candidates. This allowed cross-checking the
definition of our selection criteria obtained from the simulations
against real data. 

As regards the C1 sample, only genuine extended X-ray
sources are detected, as expected, with no additional
contamination. The majority (≈85%) of the extended C1
sources are clusters, the remainder being nearby galaxies. A
little contamination, around 0.5 false detections per pointing,
is observed in the C2 sample (in the present datasets,
this amounts to about 50% of the newly-detected sources,
once the nearby galaxies and C1 clusters have been
excluded). This false detection rate, while still acceptable for
a survey with optical follow-up, is slightly higher than our 
estimates from simulations, and this is probably the result 
of neglecting the AGN correlation function (thus lowering 
the number of non-deblended close pairs of AGNs). Another 
possibility is that we are detecting some AGNs that are in- 
cluded in cosmological filaments with weak X-ray emission, 
which was not accounted for in the simulations. 

3.4.3 Example runs on z>1 clusters

We ran the pipeline on the archival XMM-NEWTON observation
0111790101, for which the detection of the highest redshift
X-ray cluster to date, XMMU J2235.3-2557, at z ~ 1.4,
was reported (Mullis et al. 2005). The observation was
performed in the MEDIUM FILTER and EPIC-PN small window
mode, so that the source, located 7.7' from the optical
axis, is only observed in the EPIC-MOS detectors. Using the
full 4.5 × 10^4 s exposure of the pointing (~ 3 × 10^4 s at the
source position), the cluster is easily identified as a C1
extended source. We further simulated the XMM-LSS observing
conditions by analyzing only the first 10^4 s of the
observation. XMMU J2235.3-2557 is still detected, with extended
fit DET_STAT=93.8, EXT_STAT=31.1 and EXT=9.8, as a
C2 extended source at the limit of the C1 parameter space,
and would therefore have been detected as C1 in the exact
XMM-LSS observing conditions (i.e. using THIN FILTER
and with EPIC-PN data available). The ease with which
this high redshift cluster is detected is mainly due to its
apparent brightness: ~ 220 (resp. 70) photons were available
in the 4.5 × 10^4 s (10^4 s) exposure. For comparison,
we note that the z=1.22 cluster XLSSJ022302.6-043621
detected by Bremer et al. (2006) in the XMM-LSS survey is
classified as a C2 source (EXT=5.4, EXT_STAT=15.4, and
DET_STAT=51.4) with only 58 photons available for the fit.
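The classification of the two high-redshift clusters above can be reproduced with a small decision function. This is a sketch under stated assumptions: the C2 boundaries (EXT > 5", EXT_STAT > 15) follow the caption of figure 5, whereas the C1 thresholds used here (EXT_STAT > 33 and DET_STAT > 32) are assumed values that are not given in this section and should be checked against the class definitions. The function name is hypothetical.

```python
def classify_extended(ext, ext_stat, det_stat):
    """Return 'C1', 'C2' or None from the extended-fit outputs of a source.

    ext      : fitted extent in arcsec (EXT)
    ext_stat : extension likelihood (EXT_STAT)
    det_stat : detection likelihood (DET_STAT)
    """
    # Common requirement for both classes (figure 5 caption).
    if ext <= 5.0 or ext_stat <= 15.0:
        return None
    # Assumed C1 thresholds; not stated in this section.
    if ext_stat > 33.0 and det_stat > 32.0:
        return "C1"
    return "C2"

# Values quoted in the text for the two z > 1 clusters.
xmmu_j2235 = classify_extended(ext=9.8, ext_stat=31.1, det_stat=93.8)
xlssj0223 = classify_extended(ext=5.4, ext_stat=15.4, det_stat=51.4)
```

With these assumed thresholds both sources fall in C2, the first one close to the C1 boundary in EXT_STAT, in line with the description in the text.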



4 THE XMM-LSS SELECTION FUNCTION 

Our simulation programs provide us with tools to compute
the XMM-LSS selection function. We can derive the detection
probability as a function of source characteristics for any
exposure time, background level, and position on the
detector.



Figure 7 shows the point-source detection probability
inside a radius of 10' from the mean optical axis as a function 
of flux. From this, a direct estimate of our mean sky coverage 
can be obtained. 

For a given cosmology, a galaxy cluster of given lumi- 
nosity, temperature, physical extent and redshift can be de- 
scribed by an angular core radius and a detected XMM count 
rate, for which figure 8 gives the detection probability for C1
and C2. We are therefore now able to properly describe our 
galaxy cluster selection process. 

As an illustration, we compute below the expected redshift
distribution of C1 and C2 clusters in a ΛCDM cosmology.

4.1 Cosmological model 

In the following, the cosmological parameters that determine
the dynamics and content of the universe are set to WMAP
values (Spergel et al. 2003), namely:

H_0 = 71 km s^-1 Mpc^-1, Ω_m = 0.27, Ω_Λ = 0.73, Ω_b = 0.044,
n = 0.93 and σ_8 = 0.84.

4.1.1 Mass function

The shape of the linear power spectrum P(k) is computed
at z=0 using the initial power law dependency in k^n and the
transfer function from Bardeen et al. (1986). The influence
of baryons on the transfer function was modelled using the
shape parameter:

\Gamma = \Omega_m h \times \exp\left[-\Omega_b \left(1 + \sqrt{2h}/\Omega_m\right)\right] \qquad (8)

introduced by Sugiyama (1995), and the overall spectrum is
normalized to σ_8.
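As a quick numerical check of equation (8), the shape parameter can be evaluated for the cosmological parameters adopted in section 4.1. The function name below is only illustrative; the formula is the one quoted in the text.

```python
import math

def shape_parameter(omega_m, omega_b, h):
    """Shape parameter of equation (8): Omega_m h exp[-Omega_b (1 + sqrt(2h)/Omega_m)]."""
    return omega_m * h * math.exp(-omega_b * (1.0 + math.sqrt(2.0 * h) / omega_m))

# Parameters of section 4.1 (h = H_0 / 100 km s^-1 Mpc^-1 = 0.71).
gamma = shape_parameter(omega_m=0.27, omega_b=0.044, h=0.71)
```

For these values the baryon correction lowers the shape parameter from Ω_m h ≈ 0.19 to about 0.15.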

Then, at each redshift value on a fine grid, P(k) is evaluated
from its z=0 value using the linear growth factor from
Carroll et al. (1992) and σ(M) is deduced. The comoving
halo number density as a function of mass, dn/dm(z), is
computed using the Sheth & Tormen (1999) mass function.
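The rescaling of P(k) with redshift relies on the linear growth factor. The sketch below uses the widely quoted fitting formula of Carroll et al. (1992) for the growth suppression factor; whether the paper used exactly this approximation or a direct integration is not stated here, so the snippet should be read as an illustration with assumed function names.

```python
def omegas_at_z(z, omega_m, omega_l):
    """Matter and Lambda density parameters at redshift z."""
    e2 = (omega_m * (1.0 + z) ** 3 + omega_l
          + (1.0 - omega_m - omega_l) * (1.0 + z) ** 2)
    return omega_m * (1.0 + z) ** 3 / e2, omega_l / e2

def growth_suppression(omega_m, omega_l):
    """Carroll et al. (1992) approximation of the growth suppression factor g."""
    return 2.5 * omega_m / (omega_m ** (4.0 / 7.0) - omega_l
                            + (1.0 + omega_m / 2.0) * (1.0 + omega_l / 70.0))

def linear_growth(z, omega_m=0.27, omega_l=0.73):
    """Linear growth factor D(z), normalised so that D(0) = 1."""
    om_z, ol_z = omegas_at_z(z, omega_m, omega_l)
    g_z = growth_suppression(om_z, ol_z)
    g_0 = growth_suppression(omega_m, omega_l)
    return g_z / g_0 / (1.0 + z)

# P(k, z) = D(z)^2 P(k, 0) in linear theory, so sigma(M, z) = D(z) sigma(M, 0).
d_at_1 = linear_growth(1.0)
```

In the adopted cosmology the growth factor at z = 1 is roughly 0.62, compared with 0.5 in an Einstein-de Sitter universe.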

This common procedure to determine the halo mass
function has been extensively tested on numerical simulations
and is known to provide accurate predictions as long as one
defines the mass of the haloes to be the one included inside
r_200b, the radius that encloses an overdensity of 200 with
respect to the mean background density.
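For reference, the Sheth & Tormen multiplicity function can be written in a few lines. The parameter values below (A = 0.3222, a = 0.707, p = 0.3, δ_c = 1.686) are the commonly quoted ones and are assumptions here, since the text does not list them; the snippet only checks that the function is normalised, and leaves the computation of σ(M) aside.

```python
import math

ST_A, ST_SMALL_A, ST_P = 0.3222, 0.707, 0.3   # commonly quoted values (assumed)
DELTA_C = 1.686                               # linear collapse threshold (assumed)

def st_multiplicity(sigma):
    """Sheth-Tormen f(sigma), with dn/dM = (rho_mean / M^2) f |dln sigma / dln M|."""
    nu = DELTA_C / sigma
    x = ST_SMALL_A * nu * nu
    return (ST_A * math.sqrt(2.0 * ST_SMALL_A / math.pi)
            * (1.0 + x ** (-ST_P)) * nu * math.exp(-0.5 * x))

def mass_fraction_in_haloes(ln_nu_min=-40.0, ln_nu_max=math.log(12.0), n_steps=20000):
    """Integrate f over d ln(nu); the result should be close to 1."""
    step = (ln_nu_max - ln_nu_min) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        nu = math.exp(ln_nu_min + i * step)
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * st_multiplicity(DELTA_C / nu) * step
    return total

norm = mass_fraction_in_haloes()
```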

4.1.2 Applying the selection function

Knowledge of the cluster scaling relations is needed to pre- 
dict the temperature and luminosity of these haloes and 
compute XMM count rates. Unfortunately, one generally
does not have access to the mass within r_200b from the X-ray data,
and a halo profile model is required in order to convert the 
mass function to another mass definition. 

For this purpose, we used NFW profiles with scaling
radius r_s provided by the model of Bullock et al. (2001), which
relates r_s to the virial mass of the halo through the
concentration parameter c = r_vir/r_s (see footnote 9). The
conversion itself is performed using the formulae provided in the
appendix of Hu & Kravtsov (2003).

9 Note that we also tested the model of Eke, Navarro &
Steinmetz (2001) and found a change in the redshift distribution
of our C1/C2 samples lower than 10%.

The emission-weighted gas temperature is derived using
the local M200-T relation of Arnaud, Pointecouteau & Pratt
(2005), i.e. a slope of α = 1.49, valid for clusters with
T > 4 keV. At lower temperatures, we added a gradual
steepening of the correlation (α = 1.85 below 4 keV and
α = 2 below 2 keV) as indicated by several recent works (see
e.g. Finoguenov et al. 2001). No evolution of the M200-T
relation with redshift was assumed. As an arbitrary condition
to be considered as a group or cluster, we subsequently
removed all haloes with T < 1 keV.
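The steepening of the M200-T relation described above can be illustrated with a continuous broken power law. This is a sketch and not the relation used in the paper: the text speaks of a gradual steepening, whereas sharp breaks at 4 and 2 keV are assumed here for simplicity, and the normalisation `m_at_5kev` is a placeholder value that should be replaced by the published one.

```python
def m200_from_temperature(t_kev, m_at_5kev=5.74e14):
    """Mass (solar masses) for a temperature in keV, with slopes 1.49 / 1.85 / 2.

    m_at_5kev is an assumed normalisation at 5 keV; continuity is enforced
    at the 4 keV and 2 keV breaks.
    """
    m_at_4kev = m_at_5kev * (4.0 / 5.0) ** 1.49
    if t_kev >= 4.0:
        return m_at_5kev * (t_kev / 5.0) ** 1.49
    if t_kev >= 2.0:
        return m_at_4kev * (t_kev / 4.0) ** 1.85
    m_at_2kev = m_at_4kev * (2.0 / 4.0) ** 1.85
    return m_at_2kev * (t_kev / 2.0) ** 2.0

def is_group_or_cluster(t_kev):
    """Selection used in the text: haloes with T < 1 keV are discarded."""
    return t_kev >= 1.0
```

In practice the relation is applied in the other direction (mass to temperature), which only requires inverting each branch.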

Bolometric luminosities are then computed using the
Lx-T relation of Arnaud & Evrard (1999) with no
evolution. Though there is some evidence that the local Lx-T
relation also steepens at low T, this seems to be important 
only for T < 1 keV (Ponman et al., in preparation) and is 
consequently ignored. 

The total XMM-NEWTON EPIC count rate is estimated
using an APEC thermal plasma emission model
(Smith et al. 2001; see footnote 10) with neutral hydrogen
absorption as modelled by Morrison & McCammon (1983), using a
fixed column density of 2.6 × 10^20 cm^-2 (representative of our
field), folded through the EPIC response matrices for the THIN
filter in accordance with our observing mode.

The selection function is finally applied assuming a con- 
stant physical core radius of 180/if kpc. 
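Because the selection function is expressed in terms of angular core radius, the fixed physical core radius has to be converted to an apparent size at each redshift. The sketch below does this for the flat cosmology of section 4.1; it takes the core radius as 180 kpc and ignores any Hubble-constant scaling attached to that value in the text, which is an assumption, and the function names are illustrative.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def angular_diameter_distance_mpc(z, h0=71.0, omega_m=0.27, omega_l=0.73, n_steps=1000):
    """Angular diameter distance in Mpc for a flat universe (Simpson integration)."""
    if z <= 0.0:
        return 0.0
    step = z / n_steps  # n_steps must be even for Simpson's rule
    total = 0.0
    for i in range(n_steps + 1):
        zi = i * step
        weight = 1.0 if i in (0, n_steps) else (4.0 if i % 2 else 2.0)
        total += weight / math.sqrt(omega_m * (1.0 + zi) ** 3 + omega_l)
    comoving = C_KM_S / h0 * total * step / 3.0
    return comoving / (1.0 + z)

def core_radius_arcsec(z, r_core_kpc=180.0):
    """Apparent core radius in arcsec of a cluster with physical core radius r_core_kpc."""
    d_a_kpc = angular_diameter_distance_mpc(z) * 1000.0
    return math.degrees(r_core_kpc / d_a_kpc) * 3600.0

theta = {z: core_radius_arcsec(z) for z in (0.25, 0.5, 1.0, 1.5)}
```

With these assumptions the apparent core radius drops from about 46" at z = 0.25 to about 30" at z = 0.5, and then changes by only about 1" between z = 1 and z = 1.5, which is why the selection approaches a flux limit at high redshift.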

4.2 Results 

Using this simplified model and the selection functions ob- 
tained from the simulations, we find that: 

• The C2 sample should contain roughly 12 clusters per
deg^2. When the XMM-LSS is complete, it will thus constitute
the deepest X-ray selected galaxy cluster sample over
a wide area.

• The C1 sample should contain some 7 clusters per deg^2.
While this source density is a bit lower than for C2, this
selection process can be applied to the whole of the XMM
archive, regardless of expensive and time consuming optical
spectroscopy follow-up, as the sample is effectively
uncontaminated.

The expected redshift distribution for both samples is shown
in figure 9. Panel (a) of that same figure also gives an idea
of the luminosity distribution of the C2 sample.

To validate these results, we compared them with the
redshift distribution of the observed C1 clusters. The sample
contains 29 sources, of which 24 have already been
spectroscopically confirmed. Assuming that the 5 missing sources
(~ 17% of the sample) will not significantly alter the current
distribution, we find very good overall agreement with our
prediction (fig. 9b).

A further interesting result, already outlined in section
3.3.2 and fig. 10, is that our selection process does not
reproduce a flux limited sample, especially at z < 0.6 where the
change in angular distance is significant (see figure 11a).

This point is further illustrated in fig. 11b, where we
investigate our detection efficiency as a function of cluster
flux. This shows, for the assumed cluster population, the
a priori impossibility of constructing a flux limited sample
from our primary catalogues, even accepting a substantial
contamination level, unless a very high flux limit is set. In
the present study, SExtractor is run on optimally filtered
images (retaining only significant structures above 3σ) with
a very sensitive detection threshold, and our results suggest
that we have reached the limit of the data. This therefore
challenges any further attempt at defining deep flux
limited samples with XMM.

10 http://cxc.harvard.edu/atomdb/

4.3 Limitations of the present model 

Although the match between this simple model and our
data (as shown in fig. 9b) is impressive, one should keep in
mind that some ingredients of the model are still uncertain
(constraining them is precisely the purpose of the XMM-LSS).

In particular, while the evolution of the M200-T relation
is still unknown, there seems to be some indication of a
positive evolution as predicted by self-similar models (see e.g.
Ettori et al. 2004; Maughan et al. 2006). However, none of
these studies probes our range of temperature and redshift,
and the influence of non-gravitational processes may
well alter this behaviour in the group regime; hence our use
of the simplest non-evolving relations.

Also, our assumption of a fixed core radius may seem too
simple to properly account for the variation of the gas
distribution with cluster mass: one would have to consider
lower values for groups, as indicated by
observations (e.g. Osmond & Ponman 2004). However, such
data are generally largely dominated by scatter and there is
currently no well-established scaling relation for these global
trends.

Finally, a large fraction of the observed scatter on all 
these scaling relations is intrinsic to the source properties 
and results from the complex process of hierarchical merg- 
ing in cold dark matter cosmologies and feedback from non- 
gravitational activity. 

These are a number of caveats that need to be taken
into account in the interpretation of such a small sample
of low temperature systems. In a forthcoming paper
(Pacaud et al. 2006), where we will present the full cluster
catalogue, we shall further discuss the effect of the various
cluster scaling laws and evolution schemes on the dn/dz,
using as input our L-T relation for groups at redshift around
0.5.



5 SUMMARY AND CONCLUSIONS 

We have described the procedure that we developed to
analyse the 1 × 10^4 - 2 × 10^4 s XMM images of the XMM-LSS
survey. The main motivation of this work is the need to
assemble a sample of clusters of galaxies out to a redshift
of unity with controlled selection effects, suitable for
cosmological and evolutionary studies. The resulting pipeline
consequently combines multi-resolution wavelet filtering (MR/1)
to reach the source detection limit, with a subsequent
maximum likelihood analysis (Xamin) to characterize the source
properties.

The performance of the adopted procedure has been
thoroughly tested by means of extensive image simulations: either
reproducing all instrumental and astrophysical effects, or
injecting extended and point-like sources into already existing
pointings. This allowed us to investigate its ultimate
capabilities, such as resolving power, cluster detectability and
characterization as a function of flux and apparent size, and
photometric accuracy. In this respect, our package constitutes
a significant improvement over the standard SAS and the
XMDS procedure (Chiappetti et al. 2005), especially for the
extended source analysis.

Moreover, the Xamin output parameter space, densely 
scanned by the simulations, provides a powerful means to 
interpret the detected sources. In this way, we are able 
to define two classes of extended sources: the C1 class
which is basically uncontaminated by misclassified point-like 
sources, and the C2 class allowing for some 50% contami- 
nation. This selection process, derived from the simulations, 
has been subsequently checked and validated against the 
current XMM-LSS sample of spectroscopically confirmed 
galaxy clusters. 

Finally, considering a canonical power spectrum combined
with a simple halo model providing n(M, z) and simple
cluster scaling laws (M-T-L) in a ΛCDM cosmology
allowed us to predict the dn/dz distribution of the C1 cluster
population. Comparison with our current C1 data sample
shows a very good agreement. From this, we infer that our
goal of producing a cluster sample with controlled selection 
effects is fulfilled at this stage. An important point to be 
further emphasized is that the resulting sample is not flux 
limited - a concept that is anyway not rigorously applicable 
when dealing with extended sources spanning a wide range 
in flux and size. 

The way the C1 class is defined allows us to construct
a purely X-ray selected cluster sample with a high number
density of ~ 7/deg^2 in the redshift range [0-1.2]. Moreover,
an unprecedented density of ~ 12/deg^2 can be obtained with
the C2 sample, which includes objects down to a flux of
~ 5 × 10^-15 erg s^-1 cm^-2. This opens the door to the routine
construction of unbiased cluster samples from XMM images.

In the very near future, with the compilation of the
full XMM-LSS cluster sample over the currently existing 5
deg^2, we shall refine the cosmological modelling of the
observed dn/dz (Pacaud et al. 2006). In particular, we shall
further investigate the effect of varied evolution schemes of
the scaling relations, and assumptions on cluster sizes and
shapes (including scatter on these average trends). Both
aspects are especially relevant for the T < 2 keV groups out to
z ~ 0.5, a population that the XMM-LSS is unveiling for the
first time and that constitutes the bulk of our sample. Noting
that the C1 cluster sample is almost identical to the sample
for which we can measure a temperature (Pierre et al. 2005),
we shall also be in a position to constrain the evolution of
the Lx-T relation.

The combined dn/dz, Lx-T, and shape-modelling will 
provide very useful constraints on numerical simulations, the
missing link between the theoretical parameter M and the 
observable Lx, and consequently a self-consistent descrip- 
tion of the building blocks of the present day clusters. 
Figure 1. Examples of simulated 10^4 s XMM-NEWTON images (co-addition of the EPIC cameras). Both contain point sources
distributed following the X-ray Log(N)-Log(S). Blue circles show the position of simulated clusters. Top: clusters have core radii of 20" and
on-axis count rates of (from top to bottom) 0.02, 0.01 and 0.03 cts s^-1. Bottom: clusters have core radii of 50" and on-axis count rates
of (from top to bottom) 0.03, 0.02 and 0.05 cts s^-1. Displayed clusters are very faint (close to the detection limit) so as to illustrate the




Figure 2. Wavelet images of the fig. 1 simulations, overlaid with SExtractor catalogues. The blue circle shows the central 13' radius of
the FOV (centered on the mean optical axis) where SExtractor detections are performed.
Figure 3. Raw images of the fig. 1 simulations, overlaid with Xamin catalogues. The green 10" radius circles show the detected point
sources (see fig. 5 for the selection criteria); black and magenta circles show the C1 and C2 clusters respectively. Clusters not flagged by
Xamin as C1 or C2 are indicated by red dashed-line circles.
Figure 4. Extended source characterisation with the XMM-LSS pipeline within 10' of the FOV for 10^4 s exposures. For all plots, vertical
bars show the standard deviation of measured points. Upper plots show the results for clusters in pointings without point sources, lower 
plots show the results when clusters are injected into real pointings. Left: photometry as a function of on-axis counts and core radius. 
Right: source extension measure as a function of on-axis counts and core radius. In all the plots, only the bins encompassing at least 10 
recovered sources are shown. 




Figure 5. Determination of the XMM-LSS pipeline selection criteria. AGNs are displayed as green diamonds, galaxy clusters as blue
squares. Red triangles stand for spurious detections. Panel (a): Selection of point sources in the Count Rate - Detection Likelihood plane;
the solid line at Likelihood=15 defines the point source sample. Panel (b): cluster selection in the Extent - Extension Likelihood plane;
the solid lines at Extent=5" and Likelihood=15 define the C2 sample; the dashed line shows the extension likelihood criterion of the C1
sample.
Figure 6. Detection probability for extended sources by SExtractor; this can be considered as the ultimate sensitivity with 10^4 s XMM
images, but the contamination is maximal. 
Figure 7. Point source analysis with the XMM-LSS pipeline within 10' of the FOV for 10^4 s exposures. Panel (a): detection probability
as a function of count rate. Panel (b): photometry; dashed and dotted-dashed lines show respectively the intrinsic 1σ and 3σ scatter
expected from Poisson noise.
Figure 8. Extended source detection efficiency of the XMM-LSS pipeline in 10^4 s exposures as a function of source counts and core
radius inside 10' of the FOV. Panel (a): C1 sample. Panel (b): C2 sample.
Figure 9. Cosmological expectations of the C1 and C2 samples for sources with T > 1 keV. Panel (a): Luminosity and redshift
distribution of the C2 sample. Panel (b): redshift distribution of the observed C1 sources (29 sources, 24 with redshifts) compared to the
ΛCDM expectations.
Figure 10. Probability of detecting a cluster located inside the central 10' of the FOV as a C2 source as a function of its redshift and
luminosity given our cosmological model within 10^4 s pointings. An indicative flux limit of 2 × 10^-14 erg s^-1 cm^-2 is shown by the thick
dashed line.








Figure 11. Comparison of our source selection process with the common assumption of a flux limited sample for sources with T > 1 keV. 
Panel (a): expected dn/dz for class 1 (blue) and class 2 (red) compared to flux limited surveys; on average, we detect higher redshift 
clusters than flux limited surveys with the same source density. Panel (b): redshift distribution of the sources not detected by the first 
pass (MR/1 + SExtractor) above several flux limits, assuming the source population generated by our simple cosmological model (we 
miss the low end of the luminosity function). 



